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We study various aspects of extracting spectral information from time correlation functions of 
lattice QCD by means of Bayesian inference with an entropic prior, the maximum entropy method 
(MEM). Correlator functions of a heavy-light meson- meson system serve as a repository for lattice 
data with diverse statistical quality. Attention is given to spectral mass density functions, inferred 
from the data, and their dependence on the parameters of the MEM. We propose to employ simulated 
annealing, or cooling, to solve the Bayesian inference problem, and discuss practical issues of the 
approach. 
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I. INTRODUCTION 

Numerical simulations of quantum chromodynamics 
(QCD) on a Euclidean space-time lattice provides access 
to mass spectra of hadronic systems through the analy- 
sis of time correlation functions. In theory the latter are 
linear combinations of exponential functions 



C(t,t Q ) = Zie-^'^ + Z 2 e- E ^ t - to '> + 



(1) 



where the E n are the excitation energies of the system 
and the strength coefficients 



|<n|$(t )|0>| 



(2) 



are matrix elements of some vacuum-subtracted operator 
$(t ) = ®(to)-(0\3>(t )\0) between the vacuum |0) and a 
ground or excited state \n),n > 0. In practice the expo- 
nential model (Q) is fitted to noisy numerical simulation 
'data'. The statistical quality of simulation data rarely 
is good enough for the two-exponential fit ([!]) to succeed. 
It is common practice to look at the large-i behavior of 
the correlation function C{t, to) in a i-interval where it is 
dominated by only one exponential, with the lowest en- 
ergy, and then make a one-parameter fit to a plateau of 
the effective-mass function p. e s(t,to) = —d\nC(t,to)/dt. 
Possible discretizations are 



He«,o(t, to) 
Meff,i(i, h) 

Meff,2(i, t ) 
£*eff,3(Mo) 



In 



( C(t + l,t ) 
V C(Mo) 



(3a) 
(3b) 
(3c) 



C(t+l,t ) ^ 
C{t,to) ~ 
C(t + l,to)-C(t-l,to) 

2C(f,t ) 
~ - sinh(TO cffi2 ) 

C(t + l,to) + C(t-l,t )-2C(t,t Q ) 
C(t,t ) 

~ 2(cosh(m e ff, 3 ) - 1) ■ (3d) 



The expressions after the ~ are the values of p, e s for a 
pure plateau of mass m e ff. The procedure implies the 
selection of consecutive time slices t = t\ . . . t 2 for which 
Mcff = const, within errors, and an appropriate fit. The 
selection of this, so-called, plateau is a matter of judg- 
ment. A condition for reliable results is that the corre- 
lation function (Q) is dominated by just one exponential 
term, usually the ground state. The latter can be en- 
hanced by the use of smeared operators Q and fuzzy link 
variables This analysis procedure discourages con- 
sideration of excited states. In fact it will only produce 
reliably results if those are suppressed. Workarounds in- 
volve diagonalization of a correlation matrix of several 
operators or variational techniques Sj. Those however, 
still rely on plateau selection without utilizing the infor- 
mation contained in the entire available time-slice range 
of a correlation function. 

As lattice simulations of QCD now aim at excited 
hadron states, N*'s for example ||, ^j, this situation is 
unsatisfactory. Alternative methods employing Bayesian 
inference [Q are a viable option. The maximum entropy 
method (MEM) , which involves a particular choice of the 
Bayesian prior probability, falls in this class. Bayesian 
statistics || is a classic subject with a vast range of ap- 
plications. However, application within the context of 
lattice QCD is relatively new §, [u], |ll], [u|. 

In this work we report on our experience using the 
MEM for extracting spectral mass density functions p(oj) 
from lattice-generated time correlators 



C(t,to) = / dio p(ui)e 



3 -w(t-i ) 



(4) 
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where a discrete set of time slices t is understood. Dis- 
cretization of the w-integral with reasonably fine resolu- 
tion leads to an ambiguous problem where the number 
of parameters values p{oj) is (typically much) larger than 
the number of lattice data C(t,to)- In the MEM an en- 
tropy term involving the spectral density is used as a 
Bayesian prior to infer p{uj) from the data. 

We here apply MEM analysis to sets of lattice corre- 
lation functions of a meson-meson system. Those par- 
ticular simulations are aimed at learning about mech- 
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anisms of hadronic interaction. This will be discussed 
separately jL4j. The lattice data generated within that 
project involve local and nonlocal operators. They ex- 
hibit a wide range of statistical quality from 'very good' 
to 'marginally acceptable'. 

Our focus here is to utilize those data as a testing 
ground for Bayesian MEM analysis. In contrast to other 
works we employ simulated annealing to the solution of 
the Bayesian inference problem. The main aim of this 
work is to explore the feasibility of this approach for ex- 
tracting masses from a lattice simulation using realistic 
lattice data, including excitations. For the most part this 
translates into studying the sensitivity of the method to 
to its native parameters. 



II. BAYESIAN INFERENCE FOR CURVE 
FITTING 

From a Bayesian point of view the spectral density 
function p in (|4|) is a random variable subject to a certain 
probability distribution functional V[p\. Solution of the 
curve fitting problem consists in finding the function p 
which maximizes the conditional probability V\p «— C], 
the 'posterior probability, given a 'measured' data set C. 
Computation of p is then based on Bayes' theorem M 

V[p^C}V[C}=V[C^p}V[p], (5) 

also known as 'detailed balance' in a different context. 
The functional V[C], the evidence, gives the probability 
of measuring a data set C. The conditional probabil- 
ity V[C <— p], the likelihood function, determines the 
probability of measuring C given a spectral function p. 
Finally V\p\, the Bayesian prior, defines a constraint on 
the spectral density function p. Its choice is a matter of 
judgment. Ideally, the prior should reflect the physics 
known about the system, for example an upper limit 
on the hadronic mass scale. The posterior probability 
is the product of the likelihood function and the prior 
V[p <- C] = V[C <- p]V[p]/V[C], where the evidence 
merely plays the role of a normalization constant |Q . In- 
deed, the normalization condition j[dp\P[p <— C] = 1 
applied to (§) gives V[C] = f[dp]V[C «- p]V[p]. Thus, 
for a fixed C, we have 

P\p<-C\ocV[C^p]r\p]. (6) 

The curve fitting problem requires the product of the 
likelihood function and the prior function. 



A. Spectral density 

Our lattice data come from correlation functions built 
from heavy-light meson-meson operators 

= + V 2 <$>2 , (7) 



where $i and $2 involve local and non-local meson- 
meson fields, respectively, at relative distance r, and v 
are some coefficients [|l5[ |l6j . On a finite lattice the cor- 
responding correlator C v (t,t n ) = (<&l(t)<& v (t )) , where 
$ = $ — ($), has a purely discrete spectrum 

C v (t,to) = E K»l*i.(*o)|0>| 2 e- w »< t -*°> . (8) 

Here \n) denotes a complete set of states with energies 
uj n , some of which may be negative due to periodic lat- 
tice boundary conditions and operator structure. Our 
normalization conventions for forward and backward go- 
ing propagators are determined by defining 

exp T (w, t) = G(u)e- Ut + Q(-cj)e +a( - T -^ , (9) 

where < t < T, and O denotes the step function. We 
then expect the lattice data to fit the following model 

/+00 
dujp T (ui)exp T (ui,t -t ) , (10) 
-00 

where pt{^) is a spectral density function, defined for 
positive (forward) and negative (backward) frequencies. 
The requirement that the model be exact, F(pT\t,to) — 
C v (t,t ), leads to 

Pt {lo) = ^8{uj-Lu n )\{n\<S> v {t )\Q)\ 2 x 

[©K) + e(-^„)e- w " T ]. (11) 

Thus a discrete sum over <5-peaks is the theoretical form 
of the spectral function. Our objective is to compute 
Pt(w) from lattice data using Bayesian inference. 



B. Likelihood function 

Toward this end we proceed to construct the likelihood 
function. The lattice data come in the form of an average 
over Njj gauge configurations 

1 N v 

C v (t,t ) = Tr Y l C v (U n \t,t ), (12) 

where C v (U n \t, to) is the value of an operator, in this case 
<tl(t)$ v (to), in one gauge field configuration U n . Corre- 
lation function data on different time slices are stochas- 
tically dependent. Their errors are described by the co- 
variance matrix 

, Nu 

Tv(t U t 2 ) = —Y^ (^(^o) ~ C v (U n \tl,toj) X 

^ 71—1 

(c v (t 2 ,to)-C v (U n \t 2 ,t )) (13) 
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The x 2 -distance of the spectral model (10) from the lat- 
tice data then is 



X 



(c v (t u t Q ) - F(p T \ti,t )) Tv\ti,t 2 ) x 
(C v (t 2 ,t ) - F(p T \h,t j) ■ (14) 



For numerical work a discretization scheme of the w- 
integral in ([l0|) is required. Our choice is 



F{p T \t,t )^ ^2 Pk exp T (u fc ,t - t ) (15) 

k=K- 

where u>k = Awfc, Alu is an appropriate (small) interval, 
Pk = Acupriuik), and K- < < K + . 

The likelihood function V[C <— p] describes the proba- 
bility distribution of the data C given a certain parameter 
set p. If we imagine that the data are obtained by a large 
number of measurements, at fixed p, then the probabil- 
ity distribution for C is Gaussian by virtue of the central 
limit theorem, 



V[C - p] 



„-x72 



(16) 



This is the standard argument for employing the above 
form of the likelihood function in the context of Bayesian 
inference [0, p7| . 



C. Entropic prior 

In case some information is available about the physics 
of the system it can be used to constrain the parameter 
space of the model. This is the role of the Bayesian prior. 
In the standard approach plateau methods are a severe 
form of imposing restrictions. A two-exponential fit (Q), 
if feasible, is less constraining. In a Bayesian context it is 
possible to gradually increase the number of exponentials 
until convergence is reached. This is a strategy advocated 
in jn|, see also There, the model for the correla- 

tion function is A n e~ Ent , initially with small num- 
ber of terms, which is then constrained by the Bayesian 
prior _ e -EJ(A,-In)V2^+(£»-B») J /24„]. The quanti _ 

ties A n , <jA n i E n , OE n are input. Their choice is inspired 
by prior knowledge about the physics of the system. 

On the other hand, there is usually no a priory infor- 
mation about the location and the strengths of the peaks 
in the mass spectrum. The view that only minimal in- 
formation is available about the spectral density function 
can also be implemented in the Bayesian prior. The in- 
formation content, in the sense of 11191 EOT ElL is measured 



by the entropy S = — J2k Pk m (Pfc/ m )> on some scale m. 
Rather, a commonly used variant is the Shannon-Jaynes 
entropy j§ 



S[p]= V" ( Pk - m k - p k \n — 



(17) 



Note that pk > 0, according to (|1 1|). The configura- 
tion m = {mk : K~ < k < K + } is called the default 
model. We have S < 0, Vp, while 5 = <^=> p = m. 
The default model is a unique absolute maximum of S. 
Choosing the prior probability as 



V[p] 



(18) 



entails that V\p\ is maximal in the absence of informa- 
tion about p. An argument for (18) can be found in [jf). 
The entropy strength a and the default model m are pa- 
rameters. 



D. Computing the spectral density 

With (|l6|) and (|l^) the posterior probability (||) be- 
comes 



V[p «- C) oc e -(* 2 / 2 - a5 > 



(19) 



We wish to maximize V\p <— C] with respect to p, at 
fixed C. It can be shown that both x 2 [p] and — S[p] are 
convex functions of p = {pk ■ K- < k < K + }. Thus 



W[p] = x 2 /2 - aS 



(20) 



has a unique absolute minimum. The functional W[p] 
is nonlinear and maximally nonlocal since all degrees of 
freedom p k are coupled via the covariance matrix ( |l3| ) in 
(|l4|). To find the minimum of W[p] one option is to use 



singular value decomposition (SVD), see \ \2 



In keeping with the Bayesian probabilistic interpreta- 
tion of p an attractive alternative is to employ stochastic 
methods to solve the optimization problem W\p] = min. 



In this work we employ simulated annealing |22 , equiv 



alently known as cooling. The algorithm is based on the 
partition function 



[dp]e 



-0wW[p] 



(21) 



It involves the generation of equilibrium configurations 
p while gradually increasing (3w from an initially small 
value, following some annealing schedule. The latter is 
subject to experimentation. We have used the power law 



f3w(n) = (f3 1 -f3 )(n/Ny +/3 Q 



(22) 



with annealing steps n = . . . N between an initial (3q 
and a final (3\. 

A standard Metropolis algorithm was used to generate 
configurations p with the distribution in (^l|). In consec- 
utive sweeps local updates were done by multiplying the 
spectral parameters with positive random numbers, pk — » 
xpk ■ Some experimenting showed that T-distributed ran- 
dom deviates of order two, p a (x) = x a ~ 1 e~ x /T(a), a = 2, 
work quite efficiently at an acceptance rate centered at 
about 50%. 
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III. RESULTS 

All simulations were done on an L 3 x T = 10 3 x 30 
lattice. The gauge field and fermion actions are both 
anisotropic, with bare aspect ratio of a s /a t = 3, and tad- 
pole improved. The gauge field action is that of [|| with 
(3 = 2.4, leading to a spatial lattice constant of a s ~ 
0.25fm, a^ 1 ~ 800MeV. For the light fermions we use a 
clover improved Wilson action. The hopping parameter 
k = 0.0679 results in a mass ratio m^/m p ~ 0.75. Fol- 
lowing H only spatial directions are improved with spa- 
tial tadpole renormalization factors u s = (d) 1 / 4 , while 
u t = 1 in the time direction. Clover terms involving time 
directions are omitted. 



Some guidance for a reasonable w-discretization (15), 
of (|h]), may be derived from the physical value of the 
lattice constant at, and the time extent Tat of the lat- 
tice. Admissible lattice energies thus lie approximately 
between ir/a t w 7.5GeV and Tr/Ta t ~ 250MeV, or w 3 
and ~ 0.1 in units of a// 1 . In practice these are somewhat 
extreme bounds. Typical hadronic excitation energies 
are much less than 7r/a f « 7.5GeV. The lower bound, 
on the other hand, may well be ignored as a criterion 
for choosing the discretization interval Aw, because the 
theoretical form of p is a superposition of (5-peaks. Thus 
the resolution Aw should be small, in fact much smaller 
than ks 0.1. A reasonable lower bound is the likely sta- 
tistical error on spectral masses. For most of the results 
presented here Aui = 0.04, and K_ = -40, K+ = +80, 
leading to — 1.6 < u < +3.2, were used with (jig). 

With the annealing schedule (^), we have used N = 
2048 cooling steps, at 128 sweeps per temperature, start- 
ing at ft = 1.0 x 10~ 5 (3 and ending at ft = 1.0 x 10 +5 /3, 
with a geometric average of (3 = 1.0 x 10 +3 . These choices 
are an outcome of experimentation. With 7 ~ 16.61 in 
( p2| ) about half of the cooling steps operate in the re- 
gions fty(n) < (3 and (3w{n) > (3, respectively. The 
average value (3 is such that f3wW[p] fluctuates about 
one at around N/2 cooling steps. With the final anneal- 
ing temperature kept constant, ftp = ft, an additional 
1024 steps were done keeping 16 configurations p in order 
to measure cooling fluctuations. 

Results are robust within reasonable changes of the an- 
nealing schedule parameters, they were used throughout 
this work. 



A. Entropy weight dependence 



Pieces of those were arbitrarily matched and enhanced 
in order to exhibit a multi-exponential correlation func- 
tion. While Cx(t, to) bears no physical significance, its 
rich structure provides a useful laboratory for testing the 
a dependence of the spectral density function. 

In Fig. ^ we show a sequence of six pairs of Bayesian fits 
to the mock correlator Cx(i, to) and the corresponding 
spectral densities p for a wide range of entropy weights a. 
The stability of the global structure of p while a changes 
from 1.4 x 10 -2 to 1.4 x 10+ 7 is most notablef25|. As 
a becomes larger entire peaks vanish starting with the 
smallest one. The reason is that the annealing action 
( pp| ) gradually loses memory of the data, contained in x 2 , 
in favor of the entropy. The fit at a — 1.4 x 10 +7 exhibits 
the onset of a smoothing of the micro structure, starting 
with the largest peak. This is the signature of emerging 
entropy dominance over the data. In practice this situa- 
tion should be avoided. In our case entropy strengths in 
the region a < 10 +6 over eight orders of magnitude give 
stable consistent results. It has been proposed that spec- 
tral functions be integrated over a to avoid the parameter 
dependence Q| . Inspection of our results clearly indicates 
that averaging over a would be without consequence to 
the gross structure of p, only the micro structure would 
be affected. Even the region a > 10 +6 could be included, 
since the magnitude of p quickly becomes insignificant. 

In order to decide on a tuning criterion for a it is useful 
to monitor quantities like 



s/w 



-aS) 



f)w— *oo 



Y f 



s/x 2 



(23) 
(24) 



where (■■■}/3 w -> o refers to the annealing average mea- 
sured at the final cooling temperature, ft. We will re- 
fer to the above quantities as entropy loads. Those are 
shown in Fig. |[ It turns out that log(F) depends linearly 
on log(a) in the regions log(a) < +1 and log(a) < +4, for 
Yg/w an( i Xs/x 2 i respectively. (In fact Y w 6.2 x 10~ 4 a.) 
Beyond the linear region too much entropy is loaded into 
the annealing action W, leading to a smoothing of peaks, 
as seen in Fig. [l]. Empirically, the criterion emerging from 
this observation is to tune the entropy weight such that 
log(F) w —2 ± 1 within the linear region. The precise 
value of log(Y) is not important, also Y = Yg/w an d 
Y = Y S / X 2 work equally well. As is evident from Fig. |l| 
results are extremely robust against varying a. 



The extent to which the spectral density p depends on 
the value of the entropy weight parameter a, in (|20|), is a 
primary concern. We are interested in testing the a de- 
pendence for a case where both ground and excited states 
are prominently present in a time correlation function. 
For this reason we have constructed a mock correlator 
Cx{t,to). Its building blocks were the eigenvalues of the 
2x2 correlation matrix Cij(t,t Q ) — (&l(t)&j(to)) using 
the above mentioned local and non-local meson fields. 



B. Single- meson spectrum 

The correlation function c(t,to) — (ft (t)<f>(to)) of a 
single pseudoscalar heavy-light meson operator <p{t) = 
J2x Q A(%t)'Y5QA{&b) delivers high quality data in this 
simulation. We use these to compare with plateau meth- 
ods and make some observations relevant to the present 
stochastic approach to the MEM. 
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FIG. 1: Mock time correlation functions Cx shown with their Bayesian fits (solid lines), and the corresponding spectral densities 
p. The sequence of six pairs of figures shows how the spectral fit evolves through a change of the entropy weight a through 15 
orders of magnitude. 



TABLE I: Plateau masses derived from (|3a|-|3d|) on the time 
slice range 6 < t < 18. The entry Ei is the Bayesian result 
with Ai being the peak width (standard deviation) computed 
from the spectral density function p. Statistical errors are 
derived from a gauge configuration jackknife analysis. 



m c s,o 



m e ff,i 



Ei 



Ai 



0.468(8) 0.468(7) 0.468(3) 0.47(2) 0.471(15) 0.017(6) 



In Fig. |^ plots of the mass function discretizations (|3a|— 
3d), built from c(t,to), and the corresponding plateau 
fits are displayed. Plateau fits were made directly to 
Mcff,i=o...3- The resulting masses, other than m c g , are 
from solving (3b -3(1). Table | shows that those are con- 
sistent within statistical (jackknife) errors. 



Figure [l] gives a sense of the annealing dynamics. Be- 
side (E3I) and also shown are 



-<xS)p v 



Y X 2 = (x72W- 



(25) 
(26) 



In and |T^] the use of Ya/ W was advocated as a tuning 
criterion. In view of Fig. |] Y s / X 2 appears to be a better 
choice given its monotonic nature. A target entropy load 
is a safe tuning criterion, provided the 



ofF. 



S/x 2 
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-i±i 



cooling algorithm runs in the (upper) linear region, see 

Fig. |. 

The Bayesian analysis of the time correlation function 
c(i, to) is shown in Fig. |^. The solid line in Fig. W& ) 
derives from the computed spectral density p, via (pa). 
With the exception of to = all available time slices were 
used. Parameters are a = 5.0 x 10~ 5 , for the entropy 
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FIG. 2: Empirical dependence of the entropy load s Yg /w and 
Y S / X 2 on the entropy weight parameter a, see (^3[ ^). These 
results are for the mock correlator Cx(t,to)- The lines indi- 
cate the extent of linear relationships. 
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FIG. 3: Effective mass functions (paj-|3d|) for a single heavy- 
light meson. The horizontal lines are plateau fits in the time 
slice range 6 < t < 18. 



strength, a constant default model m — 1.0 x 10 -12 , and 
a random annealing start about m. The graph of p in 
Fig. ||(b) exhibits a global structure consisting of distinct 
peaks, some broad, and a micro structure of fluctuations 
on the scale of Aw. The micro structure depends on 
details of the annealing process, particularly the start 
configuration. Clearly, it makes no sense to infer the 
micro structure from the data. The reason is that only 
T—l = 29 data points do not contain enough information 
to determine K+—K- + 1 = K = 121 spectral parameters 
(with any sizable probability). 

On the other hand the global structure is a stable fea- 
ture. In the region to > three peaks can be distin- 
guished in Fig. [5|(b). By way of inspection we loosely 
define 

S n = {uj : uj £ peak #n} n = 1, 2 . . . . (27) 
Then, for each peak n, we may calculate the volume Z n , 



FIG. 4: Annealing dynamics in terms of the tuning functions 
Ys/w, Ys/x 2 ' an< ^ ^x 2 ' versus t ne cooling parameter /3w- 
The graphs are labeled with reference to the entropy loads 
( p3| , gj), and (^, |2tj ). This example is for the single-meson 
correlator, with entropy strength a — 5.0 x 10 -5 and a con- 
stant default model m = 1.0 x 10~ 12 . 
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FIG. 5: Time correlation function for a single heavy-light 
meson together with a Bayesian fit (a), and the corresponding 
spectral density function (b). This result stems from a single 
random start, with entropy weight a = 5.0 x 10 -5 , and a 
constant default model m = 1.0 x 10 -12 . 



the mass E n , and the width A„, according to 



Z n = J dujp T {io) 

E n = / dojp T (u))u 



A 2 n = Z- 1 jf dujpr(w)(uj-E T . 



(28) 
(29) 

) 2 ■ (30) 



These integrated, low moment, quantities are evidently 
insensitive to the micro structure. They constitute the 
information that reasonably can be expected to flow from 
the Bayesian analysis. 
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FIG. 6: Spectral density p for a single heavy-light meson, 
same as in Fig. ||(b), but on linear scales, emphasizing the 
ground and the excites states (c) and (d), respectively. The 
uncertainties of Z n , E n , and A n are standard deviations from 
eight annealing runs. 



The spectral density of Fig. ||(b) is replotted in Fig. || 
on linear scales. The tall narrow peak in Fig. ^(c) corre- 
sponds to the plateau masses of Fig. |, as listed in Tab. @. 
There, the entries E\ and Ai are the Bayesian results. 
Their statistical errors are derived from a jackknife anal- 
ysis selecting four subsets of gauge configurations. (Note 
that the uncertainties in Fig. ^| are standard deviations 
from eight annealing starts.) Cold starts from the de- 
fault model to were used to suppress the dependence on 
the annealing start configuration. The peak width Ai is 
comparable to the gauge configuration statistical error. 
This is the exception. With correlation function data 
of lesser quality (like with the two-meson operators be- 
low) the size of the peak width is typically larger than 
the statistical error. It appears that the peak width A„ 
is related to the size O n of the corresponding effective 
mass function plateau, like in Fig. or the size of the 
log-linear stretch in a plot like in Fig. ||(a). As a very 
coarse description A„9„ « const comes to mind. Using 
0i = 12 and Ai = 0.017 we have const w 0.2. The peaks 
77 = 2 and 77 = 3 seen in Fig. ^(d) would thus appear to 
originate from O n « 0.2/A„, or 1.3 and 0.8 time slices, 
respectively. (By inspection of Fig. ^ as many as 5 time 
slices appear involved, however.) The physical relevance 
of, at least, peak n = 3 is therefore questionable. On 
the other hand it is remarkable that the maximum en- 
tropy method is sensitive to the slightest details in the 
correlation function data. 



C. Default model dependence 



The Shannon-Jaynes entropy ( |17| ) implies the possible 
dependence of the computed spectral density p on the 
default model m — {rrik : K- < k < K+}. We explore 
the m dependence using as an example the time correla- 
tion function C v with v\ = 1 v% = 0, in the notation of 
(^), at relative distance r = 4. 



FIG. 7: Correlation function C\\ = (<!>{ (t)$i (t )) of a heavy- 
light meson-meson operator at relative distance r = 4. The 
Bayesian fit (solid line) is from the spectral density p shown 
on the right. At a — 2 x 10~ 6 and constant default model 
m = 1.0 x 10 -12 the spectral density p is obtained from an av- 
erage over eight random annealing start configurations. The 
average entropy load is Y S / X 2 = 0.477 for these runs. 



TABLE II: Averages of volume, energy, and width of the dom- 
inant peak seen in Fig. ^| over the six default model choices 
m = 10~ 12 . . . 10+ 3 at fixed entropy load Y s/x 2 « 0.045. The 
uncertainties are the correspondin g st andard deviations. The 
entry m c ff.o is the plateau mass ( paf) from Fig.[ll] with the 
statistical (jackknife) error, see Sect. 



HE 







Ax 




3923.(18.) 


0.972(3) 
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Figure shows the time correlation function data to- 
gether with the Bayesian fit, and the corresponding spec- 
tral density p. The latter is the average over eight ran- 
dom annealing start configurations. This has the effect 
of smoothing out the micro structure of p. We have used 
a constant default model m& = 1.0 x 10~ 12 , all k. 

The stability of this result is tested by varying the 
default model through 15 orders of magnitude, m = 
10~ 12 . . . 10 +3 , as shown in Fig. |8[ To keep effects of 
the annealing start configuration small cold starts from 
p = to, using the same random seed, were employed for 
all values of m. In each case the entropy strength pa- 
rameter a was tuned such that the entropy load Xs/x 2 
remained constant. Aside from the familiar micro struc- 
ture fluctuations, the global (physical) features are stable 
within the range of, a remarkable, fifteen orders of mag- 
nitude. Numerical experiments with non-constant m do 
not change this assessment. In Tab. [0] are listed the 
three integral quantities (28)-(|3(]) averaged over the six 
default models together with the corresponding standard 
deviations. Their smallness (0.3-3%) attests to the de- 
fault model independence of the Bayesian fits. Given the 
huge variation of the default model the stability of p is 
remarkable. 
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FIG. 8: A sequence of spectral densities p obtained from a 
wide range of constant default models m, see inserts. The 
entropy strength parameter a was tuned to keep the entropy 
load constant, Y S / X 2 w 0.045. The operator is the same as in 
Fig. | 



D. Annealing start dependence 

The annealing algorithm starts with some initial spec- 
tral configuration p m i. Depending on the purpose we 
have used cold starts from the default model, pi n i = m, 
or random starts from the default model, p- ln \.k = Xkirik, 
where the Xk are drawn from a gamma distribution of 
order two, p a (x) — x a ~ 1 e~ x /T(a), a = 2. The global fea- 
tures of the final spectral density are of course indepen- 
dent of the start configuration, but the micro structure 
of p is not. The reason is that in practice the anneal- 
ing process is neither infinitely slow nor is the final cool- 
ing temperature exactly zero. Therefore the anneal- 



FIG. 9: Excited state correlation function C2 of a heavy- 
light meson-meson operator at relative distance r — 4. The 
Bayesian fit (solid line) is from the spectral density p shown 
on the right. At a = 5 x 10~ 7 and constant default model 
m = 1.0 x 10 -12 the spectral density p is obtained from an 
average over eight random annealing start configurations. 



ing result for p settles close to the global minimum, say 
Pmin, of Vt^p]. Considering annealing (thermal) fluctua- 
tions only, we expect the deviation \p — p mm \ to be large 
in directions (of p space) where the minimum is shal- 
low. Thermal fluctuations are easily controlled, however. 
Those were kept negligible in the present study. More 
importantly, there may be local minima close to p m in 
which are only slightly larger than W^Pmin]- This situ- 
ation invites computing a set of spectral densities from 
different, say random, initial configurations. The aver- 
ages and standard deviations of the pk then gives some 
insight into the structure of the peak and the nature of 
the minimum of W and its neighborhood. 

To present an example we have selected an excited 
state time correlation function C'2(t, to) of the meson- 
meson system at relative distance r — 4. C%{t, to) is the 
smaller of the eigenvalues of the 2x2 correlation matrix 
Cij(t,t ) = ($j(t)$j(to)), on each time slice. The rea- 
son for selecting this operator is to see how the MEM 
responds to a data set that is marginally acceptable, at 
best. Figure |^ shows the correlator and the correspond- 
ing spectral density obtained from an average over eight 
Bayesian fits based on different random annealing start 
configurations. The same spectral density is displayed 
in the first frame of Fig. [n] on a linear scale. The dot- 
ted lines represent the limits within one standard devia- 
tion. The remaining three frames of Fig. |l0| show spectral 
functions from selected single start configurations. They 
illustrate the micro structure fluctuations. 

We argue that the micro structure, on a fine discretiza- 
tion scale Aw, is extraneous information. On the basis 
that the number of measured data points, as supplied by 
the time correlation function C„(i,io)> is much smaller 
than the number of inferred parameters pk, exact knowl- 
edge of p would actually constitute an information gain 
not supported by the data. Rather, only averages of suit- 
able observables based on the inferred spectral density, 
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FIG. 10: Spectral densities p of the excited state correlation 
function of Fig. [| The sequence of four frames shows the 
average (ave) over a sample of eight random annealing start 
configurations including the bounds (dotted lines) of one stan- 
dard deviation (±sig), and three selected examples of spectral 
functions making up that sample. 



TABLE III: Averages of volume, energy, and width of the 
dominant peak seen in Figs. 1(] over eight random annealing 
start configurations, at fixed a = 5.0 x 10 -7 and constant 
default model m — 10 -12 . The entry m e ff,o is the plateau 
mass dial) from Fig.[n] with the statistical (jackknife) error, 



see SectTlIIIE 





E x 


Ax 




2156.(11.) 


2.012(7) 


0.214(11) 


1.92(3) 



like (Pq)-(pO|) for example, are relevant information that 
can be extracted from the Bayesian analysis. Whether or 
not the p average of a certain observable is relevant infor- 
mation supported by the data may possibly be decided 
by the criterion that the standard deviation with respect 



III 



to different annealing starts be small. From Tab. 
see that the standard deviations for the small-moment 
averages ( |2^ , |29l , j30|) are comparable to typical gauge con- 
figuration statistical errors, for example those in Tab. Q. 
This should be an acceptable test, certainly high resolu- 
tion operators would fail it. 
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FIG. 11: Effective mass functions /i e ff,o, see (13a), of the cor- 
relator examples Cn and C2 shown in Figs!|] and [)] The 
plateaus are shown as horizontal lines extending over 9 and 2 
time slices, respectively. 



E. Relation to plateau methods 

Aside from the obvious differences in algorithm and 
philosophy it is important to understand that the tradi- 
tional plateau method and the celebrated Bayesian ap- 
proach also are distinctly different in the way they utilize 
the lattice correlator data. First, the former uses data 
on only a (subjectively) truncated contiguous set of time 
slices while completely ignoring the rest, whereas the lat- 
ter utilizes the data on all available time slices without 
bias. Second, in the plateau method the stochastic de- 
pendence of the data between the plateau time slices is 
often ignored [p6[ whereas in the Bayesian approach the 
dependence is fully accounted for through the covariance 
matrix (|i~3|). Hence, the traditional plateau method and 
the Bayesian inference approach cannot be compared on 
an equal footing. In particular, their systematic errors 
are in principle different. 

A comparison of those methods is thus reduced to ob- 
serving their responses to the same data sets. If the nu- 
merical quality of data is very good both methods (in fact 
any two methods) will of course give the same answers. 
An example is the single-meson case discussed above, see 
Tab. In case of imperfect numerical data, however, 
the two methods should be expected to give different re- 
sults. We illustrate this point by showing in Fig. |ll| the 
effective mass functions ([5a]) of the correlators Cn and 
C2 displayed in Figs. ^ and g, respectively. While the 
Cn data are somewhat level within 9 time slices, the C2 
data are extreme in the sense that only 2 data points are 
available to the plateau method. Bayesian inference, as 
illustrated by Fig. [9] and also Fig. [l(], has no problem 
responding with a distinct peak. The reason, of course, 
is that the entire set of correlator data including their 
correlations is available to the Bayesian approach. 

In Tabs. [H] and III we compare the plateau masses 
mefi,o obtained from ([?a|) to the Bayesian results E±. 
The numbers differ by about 3-5%. Note that the statis- 
tical (jackknife) errors on the plateau masses are much 
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smaller. Because of the data truncation the method has 
no way of 'knowing' about the poor quality of the corre- 
lator data, particularly in the C2 case of the exited state 
correlator. The Bayesian method, on the other hand, is 
fully 'aware' of this fact and conveys this information by 
responding with a sizable peak width Ai, which easily 
encompasses the plateau masses. 

This raises the question whether Bayesian peak widths 
or plateau mass statistical errors are a better measure for 
the uncertainty of masses extracted from lattice simula- 
tions. The answer is beyond the scope of this work. 

IV. SUMMARY AND CONCLUSION 

We have reported on our experience using Bayesian 
inference with an entropic prior, the maximum entropy 
method, to extract spectral information from lattice gen- 
erated time correlation functions. The latter were taken 
from a simulation aimed at studying hadronic interac- 
tion, but used here only as a repository of simulation 
data of diverse quality. 

In contrast to other works the method of choice for 
extracting spectral densities was simulated annealing. 

Between the maximum entropy method and simulated 
annealing there were three major concerns about the pa- 
rameter and algorithm dependence of the results: Depen- 
dence on (i) the entropy weight, (ii) the default model, 
and (iii) the annealing start configuration. Besides sug- 
gesting strategies for parameter tuning, independence of 
the Bayesian inferred spectral density p on (i) the entropy 
weight, and (ii) the default model could be demonstrated 



within a range of eight and fifteen orders of magnitude 
of the parameters, respectively. Concerning the anneal- 
ing start configuration dependence (iii) we argued that 
only spectral density averages of certain operators are 
acceptable. From an information theory point of view 
|l9| , those should be operators insensitive to the micro 
structure of the inferred spectral density. In particular, 
keeping in mind that the theoretical structure of the lat- 
tice spectral function is a superposition of distinct peaks, 
those operators include the spectral peak volume Z n , or 
normalization, the peak energy E n , or mass, and the peak 
width A„, or standard deviation. 

Bayesian inference has too long been ignored by the 
lattice community as an analysis tool. It has an advan- 
tage over conventional plateau methods for extracting 
hadron masses from lattice simulations because the en- 
tire information contained in the correlator function, or 
matrix, is utilized. This aspect is particularly important 
where excited state masses are desired, since the noise 
contamination of their signal can be significant. The 
maximum entropy method is very robust with respect to 
changing its parameters. Simulated annealing is practi- 
cal for obtaining spectral density functions. The method 
should be given serious consideration as an alternative 
for conventional ways. 
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